Technical SEO

Google Revamps Crawler Documentation with New Structure and Enhanced Content

Google has undertaken a significant update of its crawler documentation. The primary overview page has been condensed, and the content has been divided into three distinct, focused pages. Despite the changelog minimizing the significance of these changes, the overhaul includes a brand new section and a comprehensive rewrite of the crawler overview page. This reorganization allows Google to enrich the information across all crawler pages, enhancing topical coverage.

What Changed?

While Google’s documentation changelog highlights only two changes, there is substantially more to it.

Changes include:

  • An updated user agent string for the GoogleProducer crawler.
  • Information on content encoding.
  • A new section about technical properties.

The new technical properties section contains information that was not previously documented. Crawler behavior itself has not changed; however, by splitting the content across three dedicated pages, Google can add more detail to the crawler documentation while keeping the overview page concise.

The update also adds new information about content encoding (compression):

“Google’s crawlers and fetchers support certain content encodings (compressions) such as gzip, deflate, and Brotli (br). Each Google user agent’s supported encodings are indicated in the Accept-Encoding header of each request they make.”

Additional details are provided about crawling over HTTP/1.1 and HTTP/2, and Google states that its goal is to crawl as many pages as possible without overwhelming website servers.
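For example, a crawler request might advertise its supported encodings with the header Accept-Encoding: gzip, deflate, br. If you want to confirm what your own server returns for such a request, a quick check along the lines of the sketch below can help. It is only a sketch: it assumes the Python requests library is installed, example.com is a placeholder URL, and the user agent string is purely illustrative.

```python
# Minimal sketch: see which Content-Encoding a server returns when a request
# advertises the encodings mentioned in Google's documentation (gzip, deflate, br).
# Assumes the `requests` library is installed; example.com is a placeholder URL.
import requests

URL = "https://example.com/"
headers = {
    # Encodings named in the documentation update.
    "Accept-Encoding": "gzip, deflate, br",
    # Illustrative crawler-style user agent; replace with whatever you want to test.
    "User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
}

# stream=True fetches only the headers up front, so the body is never decoded
# (handy if a Brotli codec isn't installed locally).
resp = requests.get(URL, headers=headers, timeout=10, stream=True)
print("Status:          ", resp.status_code)
print("Content-Encoding:", resp.headers.get("Content-Encoding", "none"))
resp.close()
```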

What Is the Goal of the Revamp?

The change was driven by the size of the overview page: it had grown unwieldy, and adding further crawler information would have made it even larger. Google therefore split it into three subtopic-specific pages, leaving room for the specific crawler content to keep growing while the overview carries the more general information. Giving each subtopic its own page is an effective way to better serve users.

Here’s how the documentation changelog describes the change:

“The documentation became lengthy, hindering our ability to expand the information on our crawlers and user-triggered fetchers.”

The changelog also notes that the reorganization added explicit notes about which product each crawler affects, along with a robots.txt snippet for each crawler demonstrating how to use its user agent token, and that no other significant changes were made to the content. Calling the update a reorganization understates it, however: the crawler overview was substantially rewritten and three entirely new pages were created. While the core content largely remains the same, dividing it into sub-topics lets Google keep expanding the new pages without further inflating the original one. That original page, “Overview of Google crawlers and fetchers (user agents),” is now a true overview that points to the more granular standalone pages.

Google introduced three new pages:

  1. Common crawlers
  2. Special-case crawlers
  3. User-triggered fetchers

1. Common Crawlers

True to its name, this page covers the common crawlers associated with Googlebot, including Google-InspectionTool, which uses the Googlebot user agent. All of the listed bots respect robots.txt rules; a robots.txt example follows the list below.

Documented Google crawlers include:

  • Googlebot
  • Googlebot Image
  • Googlebot Video
  • Googlebot News
  • Google StoreBot
  • Google-InspectionTool
  • GoogleOther
  • GoogleOther-Image
  • GoogleOther-Video
  • Google-CloudVertexBot
  • Google-Extended
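Since the changelog highlights that each crawler entry now includes a robots.txt snippet demonstrating its user agent token, here is a hypothetical robots.txt sketch using a few of the tokens listed above. The paths are placeholders; check each crawler’s documentation page for its exact token and scope.

```
# Hypothetical robots.txt sketch; the directory paths are placeholders.

# Keep Googlebot out of a private area while leaving the rest of the site open.
User-agent: Googlebot
Disallow: /private/

# Block the GoogleOther crawler from the whole site.
User-agent: GoogleOther
Disallow: /

# Use the Google-Extended token to opt the site out of its associated products.
User-agent: Google-Extended
Disallow: /
```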

2. Special-Case Crawlers

These crawlers are tied to specific products and operate under agreements with the users of those products. They crawl from IP addresses separate from those used by Googlebot. Each entry lists the user agent token to reference in robots.txt, as shown in the example after the list.

List of Special-Case Crawlers:

  • AdSense (user agent token for robots.txt: Mediapartners-Google)
  • AdsBot (user agent token for robots.txt: AdsBot-Google)
  • AdsBot Mobile Web (user agent token for robots.txt: AdsBot-Google-Mobile)
  • APIs-Google (user agent token for robots.txt: APIs-Google)
  • Google-Safety (user agent token for robots.txt: Google-Safety)
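These crawlers are likewise referenced in robots.txt by their user agent tokens. Some of them may ignore global (*) robots.txt rules, so the reliable way to restrict one is to name its token explicitly. A hypothetical sketch with placeholder paths:

```
# Hypothetical robots.txt sketch; the paths are placeholders.

# Control the AdSense crawler with its own group.
User-agent: Mediapartners-Google
Disallow: /members/

# Name AdsBot explicitly; special-case crawlers may not follow global (*) rules.
User-agent: AdsBot-Google
Disallow: /checkout/
```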

3. User-Triggered Fetchers

This page covers bots that are triggered by a user request, such as Google Site Verifier acting on a site owner’s behalf, or a site hosted on Google Cloud fetching an external RSS feed at a user’s request. Because the fetch was requested by a user, these fetchers generally disregard robots.txt rules. The general technical properties of Google’s crawlers apply to these fetchers as well; to see which fetchers are actually hitting a server, check the access logs, as in the sketch after the list below.

Covered bots include:

  • Feedfetcher
  • Google Publisher Center
  • Google Read Aloud
  • Google Site Verifier
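Because user-triggered fetchers generally don’t honor robots.txt, the practical way to see which of them reach a site is to inspect the server’s access logs. The sketch below is one possible approach, not a prescribed method: it assumes a combined-format log at a placeholder path and simply tallies requests whose user agent contains “Google”; compare the results against the exact user agent strings in Google’s documentation for anything more precise.

```python
# Minimal sketch: tally requests from Google crawlers and fetchers in an
# access log. Assumes a combined-format log at a placeholder path, where the
# user agent is the last quoted field on each line.
import re
from collections import Counter

LOG_PATH = "access.log"  # placeholder; point this at your real log file
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

counts = Counter()
with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        match = UA_PATTERN.search(line)
        if match and "Google" in match.group(1):
            counts[match.group(1)] += 1

# Print the ten most frequent Google user agents seen in the log.
for agent, hits in counts.most_common(10):
    print(f"{hits:6d}  {agent}")
```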

Takeaway:

Google’s crawler overview page may have become less useful simply because of its size. Splitting it into focused standalone pages turns the overview into an entry point from which users can drill into specific subtopics. It is also a useful model for refreshing a page that underperforms because it tries to cover too much: breaking it into standalone topics addresses specific user needs and can make each page more useful in search results. The change reflects an effort to improve documentation usability and leave room for future content expansion, not a change to Google’s algorithm or crawler behavior.

Explore Google’s Newly Organized Documentation:

  • Overview of Google crawlers and fetchers (user agents)
  • List of Google’s common crawlers
  • List of Google’s special-case crawlers
  • List of Google user-triggered fetchers
